High-performance FFT implementation on the BOPS ManArray parallel DSP
Abstract
We present a high-performance implementation of the FFT algorithm on the BOPS ManArray parallel DSP processor. The ManArray configuration we consider for this application consists of an array controller and 2 to 4 fully interconnected processing elements (PEs). To expose the parallelism inherent in the FFT, we use a factorization of the DFT matrix into Kronecker products, permutation matrices, and diagonal matrices. Our implementation exploits the multiple levels of parallelism available on the ManArray. We use the special multiply-complex instruction, which calculates the product of two complex 32-bit fixed-point numbers in 2 cycles (pipelinable). Instruction-level parallelism is exploited via the indirect Very Long Instruction Word (iVLIW): in a single cycle, one iVLIW reads a complex number from memory, writes another complex number to memory, starts one complex multiplication while another finishes, performs two complex additions or subtractions, and exchanges a complex number with another processing element. Multiple local FFTs are executed in Single Instruction Multiple Data (SIMD) mode, and to avoid a costly data transposition, distributed FFTs are executed in Synchronous Multiple Instruction Multiple Data (SMIMD) mode.
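The abstract does not spell out which factorization is used. A common one of this kind is the Cooley-Tukey factorization written with Kronecker products: for N = m·n, DFT_N = (DFT_m ⊗ I_n) · T · (I_m ⊗ DFT_n) · L, where L is a stride-m permutation and T is a diagonal matrix of twiddle factors. The following Python sketch, which assumes this form and is not the authors' code, applies the four factors right to left. The block structure mirrors the abstract's description: the I_m ⊗ DFT_n stage runs the same length-n transform on every block (the "multiple local FFTs" pattern), while the DFT_m ⊗ I_n stage needs one element from every block, which is where a distributed implementation must exchange data between PEs. The function names `dft` and `kronecker_fft` are hypothetical, and the leaf transforms use a direct O(n²) DFT for brevity.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT with the exp(-2*pi*i*t*k/n) convention (reference and leaf transform)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def kronecker_fft(x, m):
    """DFT of length N = m*n via DFT_N = (DFT_m ⊗ I_n) · T · (I_m ⊗ DFT_n) · L."""
    N = len(x)
    if m <= 0 or N % m:
        raise ValueError("m must be a positive divisor of len(x)")
    n = N // m

    # L: stride-m permutation. Block j1 gathers x[j1], x[j1+m], ..., x[j1+(n-1)m].
    blocks = [x[j1::m] for j1 in range(m)]

    # I_m ⊗ DFT_n: the same length-n DFT applied independently to each block
    # (in a hypothetical mapping, one block per PE, all running the same code).
    blocks = [dft(b) for b in blocks]

    # T: diagonal twiddle matrix; element (j1, k1) is scaled by w_N^(j1*k1).
    blocks = [[blocks[j1][k1] * cmath.exp(-2j * cmath.pi * j1 * k1 / N)
               for k1 in range(n)]
              for j1 in range(m)]

    # DFT_m ⊗ I_n: for each k1, a length-m DFT across all blocks. This stage
    # reads one element from every block, i.e. it needs inter-PE communication.
    out = [0j] * N
    for k1 in range(n):
        column = [blocks[j1][k1] for j1 in range(m)]
        for k2, value in enumerate(dft(column)):
            out[k1 + n * k2] = value  # output lands in natural order
    return out


if __name__ == "__main__":
    x = [complex(t % 5, (3 * t) % 7) for t in range(12)]
    err = max(abs(a - b) for a, b in zip(kronecker_fft(x, 4), dft(x)))
    print(f"max error vs direct DFT: {err:.2e}")
```

With m = 4, the sketch corresponds to a 4-PE reading in which each block is one PE's local data. The sketch simply indexes across blocks in the last loop; it does not model how the paper avoids an explicit transposition by running the distributed FFTs in SMIMD mode.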
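The abstract mentions a multiply-complex instruction on "complex 32-bit fixed point numbers" without giving the data format. One plausible reading, assumed here and not confirmed by the source, is that each 32-bit word packs a signed 16-bit real part and a signed 16-bit imaginary part in Q15 format. The Python sketch below emulates such a packed complex multiply bit-exactly under that assumption. The layout (real part in the upper half), the round-to-nearest step, and the saturation to 16 bits are illustrative choices, and the names `pack`, `unpack`, `sat16` and `cmul_q15` are hypothetical, not ManArray instruction semantics. The 2-cycle pipelined timing is not modeled.

```python
def pack(re, im):
    """Pack two signed 16-bit values into one 32-bit word (assumed layout: real high, imag low)."""
    return ((re & 0xFFFF) << 16) | (im & 0xFFFF)


def unpack(word):
    """Split a 32-bit word back into signed 16-bit (real, imag) halves."""
    def to_s16(v):
        return v - 0x10000 if v & 0x8000 else v
    return to_s16((word >> 16) & 0xFFFF), to_s16(word & 0xFFFF)


def sat16(v):
    """Clamp to the signed 16-bit range (assumed saturation behavior)."""
    return max(-32768, min(32767, v))


def cmul_q15(a, b):
    """(ar + i*ai)(br + i*bi) in Q15: products are Q30, rounded and shifted back to Q15."""
    ar, ai = unpack(a)
    br, bi = unpack(b)
    re = (ar * br - ai * bi + (1 << 14)) >> 15  # >> is an arithmetic (floor) shift in Python
    im = (ar * bi + ai * br + (1 << 14)) >> 15
    return pack(sat16(re), sat16(im))


if __name__ == "__main__":
    half = 1 << 14  # 0.5 in Q15
    # (0.5 + 0.5i)(0.5 - 0.5i) = 0.5 + 0i
    print(unpack(cmul_q15(pack(half, half), pack(half, -half))))  # (16384, 0)
    # (-1)(-1) = +1 is not representable in Q15, so the real part saturates
    print(unpack(cmul_q15(pack(-32768, 0), pack(-32768, 0))))  # (32767, 0)
```

The rounding constant 1 << 14 is half an LSB of the Q15 result; actual DSP hardware may truncate instead or use a different rounding mode.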
Similar resources
H.263 Video Encoder Implementation on Bops Manta Processor
An H.263 video encoder implementation without negotiable coding options on BOPS Manta DSP is presented. First, the ManArray architecture is described. Then, partitioning between the multiple processing elements (PEs) and implementation issues of the video encoder are discussed. Based on the results of this experiment, we discuss the suitability of this parallel implementation platform for video...
The ManArray™ Embedded Processor Architecture
The BOPS ManArray architecture is presented as a coprocessor platform for the embedded processor domain, consisting of scalable design points. As an array processor, a single architecture definition and tool set supports multiple configurations of processing elements (PEs) from low end single PE to large arrays of PEs. The ManArray selectable parallelism architecture mixes control-oriented oper...
Efficient implementation of video post-processing algorithms on the BOPS parallel architecture
Deblocking and deringing are two video post-processing techniques widely used to remove coding artifacts and improve the visual quality when rendering low bit rate coded video. The algorithms used to achieve these tasks are computationally intensive and usually require high speed processors to be able to run in real time. Efficient implementations of signal adaptive filters for video post-proc...
ManArray Processor Interconnection Network: An Introduction
The present paper introduces the new interconnection network of the BOPS ManArray family of available core products. To form a ManArray network, the processing elements are completely connected within clusters and communicate with members of only two other clusters, thereby reducing signal fan-out and wiring density. With this simple network, single-step communications between a hypercube and it...
Implementation of Low-Memory Reference FFT on Digital Signal Processor
Problem statement: To improve and implement the Fast Fourier Transform (FFT), an efficient parallel form on a digital signal processor is generally necessary. The butterfly structure plays an important role in the FFT because its symmetric form is suitable for hardware implementation. Although it has a symmetric structure, the performance will be reduced under the data-dependent flow char...